Abstract

We describe the design, implementation and use of a mechanism for handling asynchronous 
signals, such as user interrupts, in the New Jersey implementation of Standard ML. Providing 
this kind of mechanism is necessary for the development of real-world application
programs. Our mechanism uses first-class continuations to represent the execution state at the 
time at which a signal occurs. It has been used to support pre-emptive scheduling in concurrency 
packages and for forcing break-points in debuggers, as well as for handling user interrupts in 
the SML/NJ interactive environment. 



1 Introduction 

Programs normally receive communication from the outside world via input operations. This method 
of communication is inherently synchronous: there is no way for the outside world to force the 
program to accept communication. But sometimes it is necessary to communicate asynchronously; 
for example, if the user wants to interrupt execution, or if the operating system needs to inform a 
program that its terminal connection has been lost. Most operating systems provide a mechanism 
for asynchronously signaling a program in these situations. For example, on UNIX systems, when 
a user types the break character (e.g., control-C), the terminal driver sends a SIGINT signal to the
process attached to the terminal. Under UNIX, a program can establish a handler that the operating
system will call when a given signal occurs. The signal handler is, in effect, a limited co-routine
of the main program. Most programs use the default signal handlers, but some applications require
specialized handlers. For example, an editor will save the edited state of a file when the terminal
connection is lost (signified by SIGHUP on UNIX). A signal-handling mechanism is therefore
necessary for implementing programs such as editors.

The SML definition ([MTH90]) includes a weak mechanism for handling asynchronous inter- 
rupts generated from the keyboard (e.g., SIGINT on Unix systems). When the user types the break 
character, the exception Interrupt is raised. This exception is primarily used to force control 
back up to the top-level read-eval-print loop, and it has a number of problems: 

*This work was supported, in part, by the NSF and ONR under NSF grant CCR-85-14862 




• Lack of robustness: it is not possible to write a robust Interrupt handler. Multiple 
SIGINTs can create race conditions in an exception handler that handles Interrupt, and, 
in addition, there is no mechanism for masking interrupts. The race conditions are a direct 
result of using exceptions, which are otherwise a synchronous control-flow mechanism, to 
support asynchronous control-flow. 

• Lack of generality: there is no support for other kinds of signals, such as hang-ups or interval 
timer interrupts. Real-world applications must be able to deal with these situations. 

• Lack of flexibility: there is no way to reliably resume execution at the point where the signal 
occurred. A general mechanism should provide a way to restart after a signal. 

We have designed and implemented a general-purpose signal mechanism for the New Jersey 
implementation of SML that addresses these concerns.

Standard ML of NJ (SML/NJ) is a publicly available implementation of SML developed by 
Dave MacQueen and Andrew Appel, with contributions from a number of other people [AM87]. It
runs on a variety of UNIX-based systems and has a highly portable run-time system [Appel90]. SML/NJ
supports first-class continuations as an extension [DHM]. Our mechanism treats asynchronous signals
as continuation-producing operations. A signal handler is a function from continuations to continuations:
it takes the current continuation at the time of the signal and returns a (possibly different)
continuation. Because of the problems with asynchronous exceptions, we have chosen to replace 
the use of the exception Interrupt by our mechanism. This means that exceptions are always 
synchronous in SML/NJ, which has advantages for compiling and reasoning about the language. 
This paper describes the mechanism, its implementation, and some applications.¹

2 Continuations in SML/NJ 

SML/NJ uses a continuation-passing style (CPS) code generator, which produces high quality 
code [AJ89]. The use of CPS in the compiler allows the easy introduction of first-class continuations
into the language. Unlike in Scheme, continuations are statically typed [DHM]; they have the
polymorphic type 'a cont. There are two built-in operations on continuations:

val callcc : ('a cont -> 'a) -> 'a 

val throw : 'a cont -> ('a -> 'b)

A τ cont is the type of a function representing the rest of the program with a formal parameter of
type τ. Continuations are created using callcc (call with current continuation) and are applied
using throw. A simple example is the expression: 

callcc (fn (k : int cont) => (throw k 5; 6)) + 7

The variable k is bound to the int cont that adds 7 to its argument; the throw applies k to 5,
giving 12. Continuations provide a natural mechanism for implementing co-routines [Wand80], such
as signal handlers.

1 This paper faithfully describes the mechanism as provided in the August 1, 1990 release of SML/NJ (version 0.62).

A subtlety of the continuation mechanism is the interaction between callcc and exceptions. 
Normally, exception handlers are dynamically scoped, but callcc binds the current exception 
handler into the continuation it passes to its argument. To illustrate, consider the following example: 

exception Foo

val (f : unit -> 'a) = (fn () => raise Foo)
(* g is an application, not a value, so it is given a monomorphic type *)
val (g : unit -> int) = throw (
      callcc (fn k => (callcc (fn k' => throw k k'); raise Foo)))
fun h x = ((x ()) handle Foo => 1)

Applying h to f will produce the value 1; applying it to g will produce an uncaught exception 
Foo. This is because g raises Foo in the exception context in which it was bound, instead of in the 
context of h’s handler. 
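The escaping behaviour of throw in the callcc example above can be seen more clearly in a small sketch that uses a captured continuation to abandon pending work. It is written for current SML/NJ releases, where callcc, throw and the type 'a cont are provided by the structure SMLofNJ.Cont; product is an illustrative name, not part of any library.

```sml
(* Current SML/NJ releases provide callcc, throw and 'a cont here. *)
open SMLofNJ.Cont

(* Multiply the elements of a list. On finding a zero, throw 0 straight to
   the continuation of the whole call, discarding the pending multiplications. *)
fun product (xs : int list) : int =
  callcc (fn (done : int cont) =>
    let
      fun loop [] = 1
        | loop (0 :: _) = throw done 0
        | loop (x :: rest) = x * loop rest
    in
      loop xs
    end)

val p1 = product [1, 2, 3, 4]   (* 24 *)
val p2 = product [5, 0, 7]      (* 0 *)
```

In the second call the multiplication by 5 is never performed: the throw replaces the whole remaining computation of product with the value 0.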

3 ML signal handling 

Our approach to supporting asynchronous signals in ML is to view them as continuation-producing
operations. When a signal occurs, the run-time system captures the current ML state as a contin- 
uation, which we call the resumption continuation, and passes it to a signal handler. The signal 
handler returns a new continuation with which the run-time system resumes ML. As with the UNIX 
signal mechanism, the interrupted thread and signal handler are co-routines. 

3.1 System.Signals

The structure System.Signals in the SML/NJ pervasive environment provides a low-level
interface to ML signal handling (see figure 1). There are a number of signals, corresponding
to a subset of the UNIX signals [UNIX], plus a signal generated after garbage collections (SIGGC). A
signal may be either ignored or caught. The function setHandler is used to install a handler
for a given signal, and the function inqHandler is used to get the current handler. A value of
NONE for the handler in these operations specifies an ignored signal. The function maskSignals
is used to turn signal masking on and off. This operation is cumulative, so that multiple masking
operations will nest.
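The text does not spell out the nesting rule; one natural reading is a depth counter, in which each maskSignals true must be matched by a maskSignals false before signals are delivered again. The sketch below models that reading in plain SML, with no real signals involved. The names maskDepth, setMask, signalsMasked and withSignalsMasked are hypothetical and are not part of System.Signals.

```sml
(* Hypothetical model of cumulative masking as a depth counter. *)
val maskDepth = ref 0

(* setMask true nests one level deeper; setMask false undoes one level. *)
fun setMask true = maskDepth := !maskDepth + 1
  | setMask false = if !maskDepth > 0 then maskDepth := !maskDepth - 1 else ()

(* In this model, signals are delivered only when no level is active. *)
fun signalsMasked () = !maskDepth > 0

(* Run f with signals masked, undoing its level even if f raises. *)
fun withSignalsMasked (f : unit -> 'a) : 'a =
  let
    val () = setMask true
    val result = f () handle e => (setMask false; raise e)
  in
    setMask false; result
  end

(* Nested critical sections: still masked inside, unmasked afterwards. *)
val inner = withSignalsMasked (fn () => withSignalsMasked signalsMasked)
val after = signalsMasked ()
```

Wrapping the masked region in a function makes it harder to leave a level unbalanced, which matters when the masking calls nest.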

An ML signal handler has the type: 

(int * unit cont) -> unit cont 
signature SIGNALS =
  sig
    datatype signal
      = SIGHUP | SIGINT | SIGQUIT | SIGALRM | SIGTERM | SIGURG
      | SIGCHLD | SIGIO | SIGWINCH | SIGUSR1 | SIGUSR2
      | SIGTSTP | SIGCONT (* not yet supported *)
      | SIGGC
    val setHandler : (signal * ((int * unit cont) -> unit cont) option)
          -> unit
    val inqHandler : signal -> ((int * unit cont) -> unit cont) option
    val maskSignals : bool -> unit
  end



Figure 1: Signature SIGNALS 



It takes a count of pending signals² and the interrupted thread’s resumption continuation as
arguments. The signal that is to be handled is not given as an argument, but is implicit in the choice of
handler called. The return value of the handler is the continuation with which execution is resumed.
All signals are masked while the handler is executing: if a signal occurs, then it is delayed until the
handler returns. This is why the handler returns a continuation instead of directly throwing to it:
returning gives the run-time system the chance to unmask signals before resuming execution. If
a signal handler raises an exception, instead of returning normally, it is treated as a run-time system
error, and SML/NJ will exit with an uncaught exception error.³
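As an illustration of the handler type, the following sketch defines a handler that only counts SIGALRM occurrences and then resumes the interrupted thread. It cannot run against System.Signals as released in version 0.62; instead it assumes current SML/NJ's SMLofNJ.Cont and simulates one delivery by hand, playing the run-time system's part of capturing the resumption continuation and throwing to whatever the handler returns. The names ticks and alarmHandler are hypothetical.

```sml
open SMLofNJ.Cont

(* Running total of SIGALRM occurrences. *)
val ticks = ref 0

(* Handler of type (int * unit cont) -> unit cont: add the pending count
   to the total and hand back the resumption continuation unchanged. *)
fun alarmHandler (n : int, resume : unit cont) : unit cont =
  (ticks := !ticks + n; resume)

(* Following figure 1, it would be installed with
     System.Signals.setHandler (System.Signals.SIGALRM, SOME alarmHandler)
   Here one delivery of two coalesced signals is simulated: capture the
   current continuation, call the handler, and resume with its result. *)
val () = callcc (fn (k : unit cont) => throw (alarmHandler (2, k)) ())
```

Because the pending count is passed in, a handler like this one stays accurate even when several signals arrive before it runs.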

In order to write robust signal handlers, there needs to be some mechanism for atomic actions: 
i.e., actions that cannot be interrupted by a signal. We guarantee that signal handlers will execute 
atomically. Signals are masked upon entry to the handler and unmasked when the handler returns. 
Signals that occur during execution of a handler are postponed until they are unmasked. Signals 
may also be masked by calling the maskSignals function with the value true. Note that it 
is not possible to unmask signals during the execution of a signal handler, although one can turn 
masking on. To avoid delaying the servicing of a signal unnecessarily, it is good practice to write 
short and simple signal handlers. If an application requires a more complicated and time-consuming
handler action, then techniques similar to those of section 4.1 should be used.

3.2 Exceptions 

The signal handler executes in a different exception handler context than that of the interrupted 
thread. For this reason, the use of callcc to build the return value of the handler requires care. 
For example, consider the following handler: 

2 The pending signal count is necessary because delays in the handling of a signal can result in multiple occurrences
of the signal before it is handled.

3 A higher-level signal interface would presumably implement something more tolerant. 
fun handler (_, resumek) =
      callcc (fn k1 => (
        callcc (fn k2 => (throw k1 k2));
        doSomething ();
        throw resumek ()))

The function doSomething is called in the exception handler context bound by the outer callcc,
which is the context in which the handler was called. If doSomething raises an unhandled 
exception, it will cause SML/NJ to terminate with an uncaught exception error. 
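One way to keep a failure in the deferred work from terminating SML/NJ is to give it a local exception handler. The sketch below follows the shape of handler above; doSomething, Oops, failures and safeHandler are hypothetical names, and the signal delivery is simulated by hand using current SML/NJ's SMLofNJ.Cont.

```sml
open SMLofNJ.Cont

exception Oops
val failures = ref 0

(* Hypothetical deferred work that happens to fail. *)
fun doSomething () : unit = raise Oops

(* Same shape as handler, but the deferred work has its own exception
   handler, so a failure is absorbed instead of escaping into the context
   in which the signal handler was called. *)
fun safeHandler (_ : int, resumek : unit cont) : unit cont =
  callcc (fn (k1 : unit cont cont) =>
    (callcc (fn (k2 : unit cont) => throw k1 k2);
     (doSomething () handle _ => failures := !failures + 1);
     throw resumek ()))

(* Simulated delivery: capture the interrupted continuation, call the
   handler, then resume with the continuation it returns. *)
val () = callcc (fn (k : unit cont) => throw (safeHandler (1, k)) ())
```

The handler itself returns immediately with k2; doSomething only runs once execution is resumed through that continuation, so signals are no longer masked while it works.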



3.3 The default handlers 

The structure Signals defines default handlers for some signals. Table 1 lists the signals with
a short description and the default handler action provided by the structure Signals. Most of
these signals correspond to standard UNIX signals [UNIX]. The signals SIGTSTP and SIGCONT are
currently unsupported, but will be implemented in the near future. SIGGC is generated following
every garbage collection.

signal      default action     description
SIGHUP      quit               hangup
SIGINT      raise Interrupt    interrupt (e.g., control-C)
SIGQUIT     quit               quit (e.g., control-\)
SIGALRM     ignore             alarm clock (interval timer)
SIGTERM     quit               software termination signal
SIGURG      ignore             urgent condition present on a socket
SIGCHLD     ignore             child status has been changed
SIGIO       ignore             I/O is possible on a descriptor
SIGWINCH    ignore             window changed
SIGUSR1     ignore             user-defined signal 1
SIGUSR2     ignore             user-defined signal 2
SIGTSTP     n.a.               currently unsupported
SIGCONT     n.a.               currently unsupported
SIGGC       ignore             garbage collection

Table 1: Default signal actions

3.4 Handling SIGINT

As we noted above, we implement a non-standard policy for handling user interrupts. We decided on
this policy after struggling with a version of this mechanism that supported asynchronous exceptions
(such as Interrupt) [Reppy90].

Our approach is to eliminate asynchronous exceptions, including the exception Interrupt. 
Although this is a departure from the definition, it is a fairly minor one. The definition states:
If the evaluation of a topdec yields an exception (for instance because of a raise
expression or external intervention) then the result of executing the program topdec
is the original basis together with the state which is in force when the exception is
generated.

([MTH90], page 64) 

We believe that the intent here is to specify a policy with respect to the top-level read-eval-print loop. 
This policy can easily be implemented without the exception Interrupt. The read-eval-print loop
captures its state at the beginning of each iteration as a continuation. The SIGINT handler throws 
to this continuation, which restores the original basis. By banishing asynchronous exceptions we 
make reasoning about programs that use exceptions more tractable, as well as allowing the compiler 
more latitude in optimizing away exception handlers. 
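The policy just described can be sketched as a toy read-eval-print loop: each iteration records its restart point with callcc, and the SIGINT handler discards the interrupted thread's resumption continuation in favour of that restart point. This is a simulation under stated assumptions, not the SML/NJ top level. Declarations are modelled as thunks, the basis is modelled by the list of completed results, the interrupt is delivered by a hypothetical deliver function, and callcc and throw come from current SML/NJ's SMLofNJ.Cont. All other names are hypothetical too.

```sml
open SMLofNJ.Cont

(* Restart point of the current loop iteration. *)
val loopTop : unit cont option ref = ref NONE

(* SIGINT handler: ignore the interrupted thread's resumption continuation
   and resume at the start of the current read-eval-print iteration. *)
fun sigintHandler (_ : int, resume : unit cont) : unit cont =
  case !loopTop of SOME k => k | NONE => resume

(* Stand-in for the run-time system delivering one signal. *)
fun deliver (handler : int * unit cont -> unit cont) : unit =
  callcc (fn (k : unit cont) => throw (handler (1, k)) ())

(* Toy loop: each "declaration" is a thunk producing a result string. *)
fun repl (decls : (unit -> string) list) : string list =
  let
    val pending = ref decls
    val results = ref ([] : string list)
    fun loop () =
      (callcc (fn (k : unit cont) => loopTop := SOME k);
       case !pending of
         [] => rev (!results)
       | d :: rest =>
           (pending := rest;              (* consume before evaluating *)
            results := d () :: !results;  (* only reached if d finishes *)
            loop ()))
  in
    loop ()
  end

val out = repl [fn () => "a",
                fn () => (deliver sigintHandler; "lost"),
                fn () => "c"]
```

Because results is only extended after a declaration finishes, the interrupted declaration leaves it untouched, much as the real loop restores the original basis.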

4 Applications 

The interface provided by structure Signals is low-level, but general enough to provide the basis 
for more sophisticated mechanisms. In this section we illustrate the use of our mechanism by
describing a number of applications and programming techniques. 

4.1 Masking signals 

Our mechanism allows all signals to be masked, but there is no mechanism for masking individual
signals. In this section, we describe a signal handling package that allows a mask to be attached to 
each handler. This mask specifies a set of signals to be masked during the handler’s execution. We 
have the following interface: 

type handler = (int * unit cont) -> unit cont 
type mask = signal list 

val install : (signal * (handler * mask) option) -> handler option 

The function install installs a signal handler and its associated signal mask, returning the previous handler.

In order to provide this finer-grained masking of signals, the installed signal handlers are run
outside the context of the handlers provided by System.Signals (recall that those handlers
mask all signals). When a signal occurs, the ML signal handler sets the handler mask and returns a 
continuation that will call the installed handler. When the installed handler returns, the signal mask 
is reset and control is thrown to the returned continuation. If a masked signal occurs, it is added 
to a list of pending signals. This list is checked when the installed handler returns and, if there are 
pending signals, then the returned continuation is passed to the installed handler of the first signal 
on the pending list. This implementation requires about 100 lines of ML code. 
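A minimal model of the bookkeeping for per-handler masks follows. It is not the package described above: to stay short it drops continuations entirely, treating each handler as a plain unit -> unit action, its install takes a handler and mask directly rather than an option, and signal arrival is simulated by a hypothetical arrive function. The datatype and the names handlers, currentMask, pending, install, arrive and runPending are assumptions for illustration.

```sml
(* A few signal names for the model; not the System.Signals datatype. *)
datatype signal = SIGINT | SIGALRM | SIGHUP

(* Model simplification: a handler is a plain action, paired with its mask. *)
val handlers : (signal * ((unit -> unit) * signal list)) list ref = ref []
val currentMask : signal list ref = ref []
val pending : signal list ref = ref []

fun member (s, l) = List.exists (fn s' => s' = s) l

(* Install a handler with its mask, returning the previous handler, if any. *)
fun install (s, hm : (unit -> unit) * signal list) : (unit -> unit) option =
  let
    val old = List.find (fn (s', _) => s' = s) (!handlers)
  in
    handlers := (s, hm) :: List.filter (fn (s', _) => s' <> s) (!handlers);
    Option.map (fn (_, (h, _)) => h) old
  end

(* Called for each signal occurrence: queue it if masked, else run its
   handler with the handler's mask added, then check the pending list. *)
fun arrive s =
  if member (s, !currentMask) then pending := !pending @ [s]
  else
    case List.find (fn (s', _) => s' = s) (!handlers) of
      NONE => ()
    | SOME (_, (h, m)) =>
        let val saved = !currentMask
        in
          currentMask := m @ saved;
          h ();
          currentMask := saved;
          runPending ()
        end

(* Hand the first pending signal to arrive (it is re-queued if still masked). *)
and runPending () =
  case !pending of
    [] => ()
  | s :: rest => (pending := rest; arrive s)

(* SIGINT arrives while the SIGALRM handler (mask [SIGINT]) is running. *)
val log : string list ref = ref []
val _ = install (SIGINT, (fn () => log := "int" :: !log, []))
val _ = install (SIGALRM,
          (fn () => (log := "alrm-start" :: !log;
                     arrive SIGINT;
                     log := "alrm-end" :: !log),
           [SIGINT]))
val () = arrive SIGALRM
val order = rev (!log)
```

The SIGINT that arrives mid-handler is deferred and only handled after the SIGALRM handler has finished and its mask has been removed.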
4.2 Concurrency



First-class continuations provide an attractive basis for implementing lightweight threads [Wand80].
The continuations of SML/NJ have been used to implement co-routine packages (e.g. [Reppy89, 
Ramsey90]), but providing pre-emptive scheduling requires additional run-time system support. 
This was one of the major reasons for developing the signal handling mechanism described in this 
paper. 

Shared data-structures, such as the process ready queue, must be accessed atomically with
respect to process switches. To ensure this, we use a simple scheme to mark the beginning and end
of critical regions (see figure 2)⁴. We assume that threads are represented by unit continuations,
and that we have functions enqueue and dequeue to manipulate the queue of ready threads.

val atomicLevel = ref 0
val signalPending = ref false

(* enter a critical region; regions may nest *)
fun atomicBegin () = atomicLevel := !atomicLevel + 1

(* leave a critical region; if a SIGALRM arrived while inside the
   outermost region, give up the processor now *)
fun atomicEnd () = let val level = !atomicLevel - 1
      in
        if (!signalPending andalso (level = 0))
          then callcc (fn k => (
            enqueue k;
            let val next = dequeue ()
            in
              atomicLevel := 0;
              signalPending := false;
              throw next ()
            end))
          else
            atomicLevel := level
      end

(* SIGALRM handler: defer the thread switch inside a critical region *)
fun alrmHandler (_, k : unit cont) =
      if (!atomicLevel > 0)
        then (signalPending := true; k)
        else (enqueue k; dequeue ())

Figure 2: Pre-emptive scheduling with atomic regions

If a signal occurs in an atomic region, then the handler sets the signalPending flag and does not
switch threads. When the thread leaves the critical region, it will note the pending signal and
relinquish control. The pre-emptive scheduler is quite simple: if the current thread is in a critical
region, then the signalPending flag is set and the current thread is resumed; otherwise, the
current thread is placed in the ready queue and another thread is dispatched.
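Figure 2 takes enqueue and dequeue as given. One possible sketch of them keeps the ready threads as a FIFO list of unit continuations; the hypothetical fork and yield below are added only to exercise the queue, and yield performs voluntarily the same enqueue/dequeue switch that alrmHandler performs when a SIGALRM arrives outside a critical region. It assumes current SML/NJ's SMLofNJ.Cont; readyQ, fork, yield, worker and run are illustrative names.

```sml
open SMLofNJ.Cont

(* FIFO ready queue of suspended threads, each a unit continuation. *)
val readyQ : unit cont list ref = ref []
fun enqueue (k : unit cont) = readyQ := !readyQ @ [k]
fun dequeue () : unit cont =
  case !readyQ of
    k :: rest => (readyQ := rest; k)
  | [] => raise Fail "dequeue: no ready threads"

(* Voluntary switch: queue the current thread, run the next one. *)
fun yield () = callcc (fn k => (enqueue k; throw (dequeue ()) ()))

(* Run f as a new thread; the parent is queued and resumes later.
   When f finishes, its thread ends by switching to the next ready one. *)
fun fork (f : unit -> unit) : unit =
  callcc (fn parent => (enqueue parent; f (); throw (dequeue ()) ()))

val log : string list ref = ref []
fun say s = log := s :: !log

fun worker name () = (say (name ^ "1"); yield (); say (name ^ "2"))

fun run () : string list =
  (fork (worker "a");
   fork (worker "b");
   yield ();          (* let both workers finish *)
   rev (!log))

val order = run ()
```

Each yield moves the current thread to the back of the queue, so the two workers interleave; with alrmHandler installed, the same switch would also happen at timer interrupts.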

4 To simplify this presentation, we only concern ourselves with SIGALRM.
4.3 Debugging



Tolmach and Appel have built a replay debugger for SML/NJ [TA90]. They use source-to-source
transformations in the compiler to instrument the user’s code. The debugger uses a software counter
to keep track of the current “time.” The instrumentation code increments this counter and compares 
it against a “target” time at important points, called events, in the code (e.g., binding sites and 
function entry-points). When the current time matches the target time, then control is transferred 
to the debugger thread (the debugger and user program are co-routines). One thing that debuggers 
should provide is a way for the user to asynchronously force a break-point in the execution of the 
program. With our signal mechanism, this function is easy to provide. The following handler for 
SIGINT will force a break-point at the next debugger event, by resetting the target time and then 
resuming the user’s thread. 

fun sigintHandler (_, k) = (targetTime := !currentTime + 1; k)

Of course, a more sophisticated implementation is possible that would allow programs that handle 
signals (such as concurrent programs) to be debugged. The debugger would provide its own 
implementation of the Signals structure, which would intercept interesting signals.
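The effect of resetting the target time can be checked with a small model of the instrumentation. In this sketch, event stands for the code inserted at each debugger event and records a break instead of switching to a debugger thread; currentTime, targetTime, event, breaks and deliver are hypothetical names, and continuations come from current SML/NJ's SMLofNJ.Cont.

```sml
open SMLofNJ.Cont

(* Hypothetical model of the debugger's clock. *)
val currentTime = ref 0
val targetTime = ref ~1          (* ~1: no break-point requested *)
val breaks : int list ref = ref []

(* Inserted at each event: advance the clock and "enter the debugger"
   (here, just record the time) when the target time is reached. *)
fun event () =
  (currentTime := !currentTime + 1;
   if !currentTime = !targetTime then breaks := !currentTime :: !breaks else ())

(* The SIGINT handler from the text: break at the next event. *)
fun sigintHandler (_ : int, k : unit cont) : unit cont =
  (targetTime := !currentTime + 1; k)

(* Simulated signal delivery, as the run-time system would perform it. *)
fun deliver (h : int * unit cont -> unit cont) : unit =
  callcc (fn (k : unit cont) => throw (h (1, k)) ())

val () = (event (); event (); event ();   (* user code runs to time 3 *)
          deliver sigintHandler;           (* user types control-C *)
          event (); event ())              (* break-point taken at time 4 *)
```

Setting the target to the current time plus one is what makes the break-point land on the next event rather than on one that has already passed.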

5 Implementation 

There are two levels to the implementation: the underlying run-time support for mapping UNIX 
signals to ML signals, and the implementation of the structure System.Signals. Following
a quick introduction to the SML/NJ run-time system, we describe the implementation bottom-up. 
Then there is a discussion of the implementation of robust I/O, followed by some performance 
measurements. 

5.1 The SML/NJ run-time system 

The SML/NJ run-time system is described in [Appel90], but it has been extensively modified to
support signals. We describe the relevant features here. 

The SML/NJ compiler uses continuation-passing style (CPS) code generation [AJ89]. Unlike
other CPS-based compilers ([KKR+86] for example), SML/NJ has no run-time stack; function
return continuations are heap allocated. The generated code is also heap allocated, so special tagging
techniques are used to allow the garbage collector to deal with program counters and code objects. 

Because there is no run-time stack, the SML/NJ code generator uses a register model. The 
ML state consists of three special-purpose registers and a machine-dependent number of root
registers. Other registers may be used as temporaries, but these are not visible to the run-time
system. Five of the root registers have special meaning: one is the program counter, one holds the
current exception handler context, and the other three are used in the standard calling convention. 
There are two kinds of functions: a two-argument function has the standard argument and closure 
registers as parameters; a three-argument function also has the standard return continuation register 
as a parameter. Two-argument functions are functions that never return, such as return continuations 
and the continuations produced by callcc. 

A C structure, called the state vector, is used to hold the ML state in the run-time system. Two 
assembly routines provide the interface between C and ML code. The run-time system invokes 
ML code by loading the state vector and calling restoreregs, which loads the machine registers
from the state vector and jumps to the given program counter. When ML code requires a service 
from the run-time system, it loads a global variable request with a request code, and calls 
saveregs, which saves the ML state and returns to the C code that called restoreregs. The 
restoreregs/saveregs protocol is essentially a co-routine switch. 

SML/NJ has a simple, but efficient, semi-generational garbage collector [Appel89]. Allocation is
also fast: it is open-coded and only requires a couple more instructions than object initialization. A
heap limit check is generated at the entry point of each code tree⁵. If there is not enough free space
for the maximum possible allocation in the code tree, then a garbage collection trap (GC-trap)
occurs⁶. The entry-point of a code tree is preceded by an object descriptor word, thus the program
counter at the trap point is a valid heap pointer. Handling a GC-trap involves the following steps:

1. The run-time routine ghandle catches the trap and records the program counter of the trap
location in the state vector, sets request to REQ_GC and returns control to the assembly 
coded routine saveregs. 

2. Saveregs saves the ML state in the state vector and passes control up to run_ml. 

3. The garbage collector is then run, using the state vector as the root set. 

4. After garbage collection, run_ml calls restoreregs, which loads the machine registers 
from the state vector, and jumps to the trap location. 

5.2 Run-time support for signals 

The major problem with handling an asynchronous signal is to avoid corrupting the heap. The ML 
signal handler cannot be dispatched immediately, since the current thread may be in the middle of 
an allocation. There are a number of ways to deal with this problem. One approach is to have the 
run-time system recognize this situation and emulate the instruction stream until the allocation is

5 A code tree (or extended basic block) is an acyclic set of blocks with one entry-point and one or more exits.

6 The GC-trap is just a UNIX signal. 